Identity disclosure protection: A data reconstruction approach for privacy-preserving data mining

Authors

Dan Zhu

Xiao-Bai Li

Shuning Wu

Abstract

Identity disclosure is one of the most serious privacy concerns in today's information age. A well-known method for protecting against identity disclosure is k-anonymity. A dataset provides k-anonymity protection if the information for each individual in the dataset cannot be distinguished from that of at least k − 1 other individuals whose information also appears in the dataset. There is a flaw in k-anonymity that would still allow an intruder to discern the confidential information of individuals in the anonymized data. To overcome this problem, we propose a data reconstruction approach to achieve k-anonymity protection in predictive data mining. In this approach, the potentially identifying attributes are first masked using aggregation (for numeric data) and swapping (for nominal data). A genetic algorithm technique is then applied to the masked data to find a good subset of it. This subset is then replicated to form the released dataset that satisfies the k-anonymity constraint.

Data-mining technologies have enabled organizations to extract useful knowledge from data in order to better understand and serve their customers, and to gain competitive advantages [6,21,26]. While successful business applications of data mining are encouraging, there are increasing concerns about invasions of the privacy of personal information. A survey by Time/CNN [16] revealed that 93% of respondents believed companies selling personal data should be required to gain permission from the individuals whose information is being shared. In another study [9], more than 70% of participants responded negatively to questions related to the secondary use of private information. Concern about privacy threats has caused data quality and integrity to deteriorate. According to [34], 82% of online users have refused to give personal information and 34% have lied when asked about their personal habits and preferences.
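The k-anonymity condition defined in the abstract, and the idea of masking a numeric attribute by aggregation, can be illustrated with a minimal Python sketch. This is not the authors' procedure: the function names `is_k_anonymous` and `aggregate_numeric`, the sample records, and the choice of replacing each sorted group of k values with its mean are assumptions made only for illustration. Swapping of nominal values, the genetic algorithm, and the replication step are not shown.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that appears in `records` is shared by at least k records."""
    counts = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return all(count >= k for count in counts.values())


def aggregate_numeric(values, k):
    """Illustrative masking by aggregation (an assumed scheme, not the paper's):
    sort the values, cut them into consecutive groups of at least k,
    and replace each value with the mean of its group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    masked = list(values)
    n = len(values)
    start = 0
    while start < n:
        end = start + k
        if n - end < k:
            # Fewer than k values would remain, so fold them into this group.
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = mean
        start = end
    return masked


# Hypothetical records: "age" and "zip" act as potentially identifying attributes.
ages = [23, 25, 31, 35, 47]
raw = [{"age": a, "zip": "852"} for a in ages]
masked = [{"age": a, "zip": "852"} for a in aggregate_numeric(ages, 2)]

print(is_k_anonymous(raw, ["age", "zip"], 2))     # every age is unique
print(is_k_anonymous(masked, ["age", "zip"], 2))  # ages now shared within groups
```

In this toy data every raw age is unique, so the raw records fail the check for k = 2, while the aggregated ages are shared by at least two records each and pass it.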
This study deals with the conflict between privacy and data mining in organizational decision support. Organizations that use their customers' records in data-mining activities are obligated to take actions to protect the identities of the individuals involved. It has been demonstrated that personal identities cannot be adequately protected by simply removing identity attributes from released data. There has been extensive research in the area of statistical databases (SDBs) on how to protect individuals' sensitive data when providing summary statistical information. The privacy issue arises in SDBs when summary statistics are derived on very few individuals' data. In this case, releasing the summary statistics may result in …

Similar resources

Identity Disclosure Protection: A Data Reconstruction Approach for Preserving Privacy in Data Mining

Identity disclosure is one of the most serious privacy concerns in today’s information age. A well-known method for protecting against identity disclosure is k-anonymity. A dataset provides k-anonymity protection if the information for each individual in the dataset cannot be distinguished from that of at least k – 1 other individuals whose information also appears in the dataset. There is a flaw in k-anonymity that wo...

A Data Reconstruction Approach for Identity Disclosure Protection

Identity disclosure is one of the most serious privacy concerns in today’s information age. A well-known method, called k-anonymity, has recently been proposed and used to protect against identity disclosure. The k-anonymity approach, however, still allows a data intruder to discern the confidential information in the anonymized data. To overcome this problem, we propose a data reconstruction approach, ...

Identity Disclosure Protection in Dynamic Networks Using K

Data mining derives accurate information for the requesting user after the raw data has been analyzed. Among its many developments, data mining faces pressing issues of security, privacy and integrity. Data mining uses a recent technique called privacy preserving data publishing (PPDP), which enforces security for the digital information provided by governments, corporations, companies and indiv...

Privacy Preserving Mechanism for Anonymizing Data Streams in Data Mining

The access control mechanism prevents unauthorized access to sensitive information and so protects user information. Privacy protection is an important concern when sensitive information is shared, and a privacy protection mechanism provides better privacy for the information to be shared. The generally used privacy ...

Algorithm-irrelevant Privacy Protection Method Based on Randomization

Privacy preserving classification mining is one of the fast-growing subareas of data mining. Algorithm-related privacy-preserving methods are designed for a particular classification algorithm and cannot be used with other classification algorithms. To solve this problem, the paper proposes a new algorithm-irrelevant privacy protection method based on randomization. This method generates and open...

Journal:

Decision Support Systems

Volume 48, Issue

Pages -

Publication date: 2009
